Match documentation with default model in scripts by shmuelk · Pull Request #615 · llm-d/llm-d-inference-scheduler

shmuelk · 2026-02-12T16:58:01Z

This PR updates the example curl command to use the model now used in the development setups.

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

nirrozenbaum · 2026-02-12T20:34:53Z

/lgtm
/approve

* update llm-d-kv-cache import to v0.5.0-RC1 (llm-d#584) * update kvc version import Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com> * add go.mod to testable changes Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com> --------- Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com> * Use 1.3.0 CRDs (llm-d#586) Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * free disk space on ci-release (llm-d#587) Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com> * feat: use Tinyllama as the "model" for kind test and switch to use precise-prefix-cache-score in config (llm-d#581) * feat: use Tinyllama as the "model" for kind test - in order to test precies-prefix-cache-score we cannot use fool-reviewer since it need call kv-cache-manager to get tokenizer by getting a real model from HF - the change is to switch the "default model" to TinyLlama - also to make tokenizer folder writable need change permission to the USER in Dockerfile - rename dp-epp-config.yaml sim-dp-epp-config.yaml as it is used for local test Signed-off-by: Wen Zhou <wenzhou@redhat.com> * update: revert back some config to keep using prefix-cache-scorer - revert file renaming Signed-off-by: Wen Zhou <wenzhou@redhat.com> --------- Signed-off-by: Wen Zhou <wenzhou@redhat.com> * Update linter configuration (llm-d#588) Signed-off-by: Etai Lev Ran <elevran@gmail.com> * fix: config should use new precise-prefix-cache-scorer (llm-d#576) - we have rename prefix-cache-scorer to precise-prefix-cache-scorer in 0.3.0, configs need migrate from the old one to the new one with spec. - rename plugin name - remove parameters.autoTune and parameters.mode: cache_tracking and lruCapacityPerServer - move hashBlockSize, maxPrefixBlocksToMatch under indexrConfig - for config using food-review keep old prefix-cache-scorer - keep pd-epp-config and sim-pd-epp-config with prefix-cache-scorer as KV and PD need both be enabled which is not done yet Signed-off-by: Wen Zhou <wenzhou@redhat.com> * deps(actions): bump crate-ci/typos from 1.42.1 to 1.42.2 (llm-d#589) Bumps [crate-ci/typos](https://github.com/crate-ci/typos) from 1.42.1 to 1.42.2. - [Release notes](https://github.com/crate-ci/typos/releases) - [Changelog](https://github.com/crate-ci/typos/blob/master/CHANGELOG.md) - [Commits](crate-ci/typos@v1.42.1...v1.42.2) --- updated-dependencies: - dependency-name: crate-ci/typos dependency-version: 1.42.2 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Updated to more recent GIE (llm-d#592) * Updated to more recent GIE Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Updated to latest GIE and chnages due to review comments Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Added a true mock SchedulerProfile Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Exploited mock SchedulerProfile Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> --------- Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * pull kvc v0.5.0 libs (llm-d#595) Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com> * deps(actions): bump crate-ci/typos from 1.42.2 to 1.43.0 (llm-d#596) Bumps [crate-ci/typos](https://github.com/crate-ci/typos) from 1.42.2 to 1.43.0. - [Release notes](https://github.com/crate-ci/typos/releases) - [Changelog](https://github.com/crate-ci/typos/blob/master/CHANGELOG.md) - [Commits](crate-ci/typos@v1.42.2...v1.43.0) --- updated-dependencies: - dependency-name: crate-ci/typos dependency-version: 1.43.0 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * address nil,nil return linter error in test mock (llm-d#598) Signed-off-by: Etai Lev Ran <elevran@gmail.com> * deps(go): bump the go-dependencies group with 2 updates (llm-d#597) Bumps the go-dependencies group with 2 updates: [github.com/onsi/ginkgo/v2](https://github.com/onsi/ginkgo) and [github.com/onsi/gomega](https://github.com/onsi/gomega). Updates `github.com/onsi/ginkgo/v2` from 2.27.5 to 2.28.1 - [Release notes](https://github.com/onsi/ginkgo/releases) - [Changelog](https://github.com/onsi/ginkgo/blob/master/CHANGELOG.md) - [Commits](onsi/ginkgo@v2.27.5...v2.28.1) Updates `github.com/onsi/gomega` from 1.39.0 to 1.39.1 - [Release notes](https://github.com/onsi/gomega/releases) - [Changelog](https://github.com/onsi/gomega/blob/master/CHANGELOG.md) - [Commits](onsi/gomega@v1.39.0...v1.39.1) --- updated-dependencies: - dependency-name: github.com/onsi/ginkgo/v2 dependency-version: 2.28.1 dependency-type: direct:production update-type: version-update:semver-minor dependency-group: go-dependencies - dependency-name: github.com/onsi/gomega dependency-version: 1.39.1 dependency-type: direct:production update-type: version-update:semver-patch dependency-group: go-dependencies ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Models extractor (llm-d#553) * Models extractor Signed-off-by: irar2 <irar@il.ibm.com> * Update register.go Signed-off-by: Ira Rosen <irar@il.ibm.com> * Updated for the newer GIE Signed-off-by: irar2 <irar@il.ibm.com> * Review comments Signed-off-by: irar2 <irar@il.ibm.com> * Check the scheme Signed-off-by: irar2 <irar@il.ibm.com> --------- Signed-off-by: irar2 <irar@il.ibm.com> Signed-off-by: Ira Rosen <irar@il.ibm.com> * feat(lmcache): implement decode first flow on lmcache connector when cache_hit_threshold field is present (llm-d#509) * feat: implement decode first flow on lmcache connector - if cache_hit_threshold field is present in completion request, then we perform a decode first flow Signed-off-by: kyano <kyanokashi2@gmail.com> * fix: error handling Signed-off-by: kyano <kyanokashi2@gmail.com> * chore: add back todo comment Signed-off-by: kyano <kyanokashi2@gmail.com> * refactor: reduce code complexity and duplication Signed-off-by: kyano <kyanokashi2@gmail.com> * refactor: improve header copying Signed-off-by: kyano <kyanokashi2@gmail.com> * chore: add comment explaning the cache_hit_threshold field and the new decode first flow Signed-off-by: kyano <kyanokashi2@gmail.com> * refactor: enhance logging for cache hit threshold in decode flow - decrease verbosity for common log - add cache_hit_threshold attribute Signed-off-by: kyano <kyanokashi2@gmail.com> * refactor: improve error handling and observability when failing to unmarshal decode response Signed-off-by: kyano <kyanokashi2@gmail.com> * chore: add deleted informational comments Signed-off-by: kyano <kyanokashi2@gmail.com> * typo Signed-off-by: kyano <kyanokashi2@gmail.com> * refactor: make error logs more descriptive of the failure reason Signed-off-by: kyano <kyanokashi2@gmail.com> * feat: add cache hit threshold to prefill request so prefill executes regardless of cache condition Signed-off-by: kyano <kyanokashi2@gmail.com> * fix: typo Signed-off-by: kyano <kyanokashi2@gmail.com> * refactor: assign 0 cache_hit_threshold before final decode attempt Signed-off-by: kyano <kyanokashi2@gmail.com> * chore: update comment according to feedback Signed-off-by: kyano <kyanokashi2@gmail.com> * chore: remove istio workaround Signed-off-by: kyano <kyanokashi2@gmail.com> * fix: set cache hit threshold to 0 in prefill request for consistent execution Signed-off-by: kyano <kyanokashi2@gmail.com> * refactor: update the log Signed-off-by: kyano <kyanokashi2@gmail.com> * feat: support online decoding Signed-off-by: kyano <kyanokashi2@gmail.com> * fix: preserve request body in lmcache connector Signed-off-by: kyano <kyanokashi2@gmail.com> * fix: support sse format for streamed decode Signed-off-by: kyano <kyanokashi2@gmail.com> * chore: add and improve log descriptions Signed-off-by: kyano <kyanokashi2@gmail.com> * fix: typo Signed-off-by: kyano <kyanokashi2@gmail.com> * nit: undo capitalization Signed-off-by: kyano <kyanokashi2@gmail.com> * fix: typos Signed-off-by: kyano <kyanokashi2@gmail.com> * chore: improve error log observability Signed-off-by: kyano <kyanokashi2@gmail.com> * refactor: encapsulate http error checking in function and reuse Signed-off-by: kyano <kyanokashi2@gmail.com> * refactor: encapsulate and reuse code better Signed-off-by: kyano <kyanokashi2@gmail.com> * fix: lint error Signed-off-by: kyano <kyanokashi2@gmail.com> * refactor: improve code encapsulation and reduce duplication Signed-off-by: kyano <kyanokashi2@gmail.com> * refactor: rename and simplify SSE event signaling logic Signed-off-by: kyano <kyanokashi2@gmail.com> * refactor: rename lmcache to shared storage protocol Signed-off-by: kyano <kyanokashi2@gmail.com> * fix: remove unused function Signed-off-by: kyano <kyanokashi2@gmail.com> * test: e2e tests Signed-off-by: kyanokashi <kyanokashi2@gmail.com> * chore: claude gitignore Signed-off-by: kyanokashi <kyanokashi2@gmail.com> * fix: sim deployment Signed-off-by: kyanokashi <kyanokashi2@gmail.com> * feat: make linter running on new code configurable Signed-off-by: kyanokashi <kyanokashi2@gmail.com> * fix: lint errors Signed-off-by: kyanokashi <kyanokashi2@gmail.com> --------- Signed-off-by: kyano <kyanokashi2@gmail.com> Signed-off-by: kyanokashi <71283892+kyanokashi@users.noreply.github.com> Signed-off-by: kyanokashi <kyanokashi2@gmail.com> * Extend support for different ways to decide if disaggregated PD is required (llm-d#531) * Initial step of a configurable pd decider which is responsible for decision whether disaggregation is required, use data added in prefix scorer plugin in PrepareRequestData Signed-off-by: Maya Barnea <mayab@il.ibm.com> * update version of GIE + fix lint Signed-off-by: Maya Barnea <mayab@il.ibm.com> * update yaml and the test according prefix plugin configuration change (blockSize replaced by blockSizeTokens) Signed-off-by: Maya Barnea <mayab@il.ibm.com> * Update docs/architecture.md Co-authored-by: Shmuel Kallner <kallner@il.ibm.com> Signed-off-by: Maya Barnea <mayab@il.ibm.com> * code review Signed-off-by: Maya Barnea <mayab@il.ibm.com> * code review Signed-off-by: Maya Barnea <mayab@il.ibm.com> * update version of GIE, update prefix_disagr_decider accordingly Signed-off-by: Maya Barnea <mayab@il.ibm.com> * fix typo Signed-off-by: Maya Barnea <mayab@il.ibm.com> * fix PD for short inputs Signed-off-by: Maya Barnea <mayab@il.ibm.com> * Update docs/architecture.md Co-authored-by: Etai Lev Ran <elevran@gmail.com> Signed-off-by: Maya Barnea <mayab@il.ibm.com> * Update pkg/plugins/profile/always_disaggr_decider.go Co-authored-by: Etai Lev Ran <elevran@gmail.com> Signed-off-by: Maya Barnea <mayab@il.ibm.com> * Update pkg/plugins/profile/always_disaggr_decider.go Co-authored-by: Etai Lev Ran <elevran@gmail.com> Signed-off-by: Maya Barnea <mayab@il.ibm.com> * Update pkg/plugins/profile/prefix_disagg_decider.go Co-authored-by: Etai Lev Ran <elevran@gmail.com> Signed-off-by: Maya Barnea <mayab@il.ibm.com> * updates according the PR comments Signed-off-by: Maya Barnea <mayab@il.ibm.com> * fix test Signed-off-by: Maya Barnea <mayab@il.ibm.com> * create pd decider plugin type with 2 implementations (for prefix based and test always), update deploy configuration according the new structure Signed-off-by: Maya Barnea <mayab@il.ibm.com> * fix e2e tests Signed-off-by: Maya Barnea <mayab@il.ibm.com> * changes according the pr comments Signed-off-by: Maya Barnea <mayab@il.ibm.com> * fix e2e test Signed-off-by: Maya Barnea <mayab@il.ibm.com> * add explanation about pd deciders to disagg_pd doc Signed-off-by: Maya Barnea <mayab@il.ibm.com> * rename always_disaggr_decider to always_disagg_decider Signed-off-by: Maya Barnea <mayab@il.ibm.com> --------- Signed-off-by: Maya Barnea <mayab@il.ibm.com> Co-authored-by: Shmuel Kallner <kallner@il.ibm.com> Co-authored-by: Etai Lev Ran <elevran@gmail.com> * chore: fix wrong port for NIXL (llm-d#593) - start with vLLM 0.11.1, default port for NIXL has been updated to 5600 - leave ZMQ to use 5557 Signed-off-by: Wen Zhou <wenzhou@redhat.com> * fix: resolve JSON serialization error in active-request-scorer debug logs (llm-d#602) * fix: resolve JSON serialization error in active-request-scorer debug logs Signed-off-by: Alberto Perdomo <aperdomo@redhat.com> * feat: Add raw scores to debug Signed-off-by: Alberto Perdomo <aperdomo@redhat.com> --------- Signed-off-by: Alberto Perdomo <aperdomo@redhat.com> * Match documentation with default model in scripts (llm-d#615) Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Test: LGTM Workflow Automation (#32) * feat: use Tinyllama as the "model" for kind test and switch to use precise-prefix-cache-score in config (llm-d#581) * feat: use Tinyllama as the "model" for kind test - in order to test precies-prefix-cache-score we cannot use fool-reviewer since it need call kv-cache-manager to get tokenizer by getting a real model from HF - the change is to switch the "default model" to TinyLlama - also to make tokenizer folder writable need change permission to the USER in Dockerfile - rename dp-epp-config.yaml sim-dp-epp-config.yaml as it is used for local test Signed-off-by: Wen Zhou <wenzhou@redhat.com> * update: revert back some config to keep using prefix-cache-scorer - revert file renaming Signed-off-by: Wen Zhou <wenzhou@redhat.com> --------- Signed-off-by: Wen Zhou <wenzhou@redhat.com> * Update linter configuration (llm-d#588) Signed-off-by: Etai Lev Ran <elevran@gmail.com> * fix: config should use new precise-prefix-cache-scorer (llm-d#576) - we have rename prefix-cache-scorer to precise-prefix-cache-scorer in 0.3.0, configs need migrate from the old one to the new one with spec. - rename plugin name - remove parameters.autoTune and parameters.mode: cache_tracking and lruCapacityPerServer - move hashBlockSize, maxPrefixBlocksToMatch under indexrConfig - for config using food-review keep old prefix-cache-scorer - keep pd-epp-config and sim-pd-epp-config with prefix-cache-scorer as KV and PD need both be enabled which is not done yet Signed-off-by: Wen Zhou <wenzhou@redhat.com> * deps(actions): bump crate-ci/typos from 1.42.1 to 1.42.2 (llm-d#589) Bumps [crate-ci/typos](https://github.com/crate-ci/typos) from 1.42.1 to 1.42.2. - [Release notes](https://github.com/crate-ci/typos/releases) - [Changelog](https://github.com/crate-ci/typos/blob/master/CHANGELOG.md) - [Commits](crate-ci/typos@v1.42.1...v1.42.2) --- updated-dependencies: - dependency-name: crate-ci/typos dependency-version: 1.42.2 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Updated to more recent GIE (llm-d#592) * Updated to more recent GIE Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Updated to latest GIE and chnages due to review comments Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Added a true mock SchedulerProfile Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Exploited mock SchedulerProfile Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> --------- Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * pull kvc v0.5.0 libs (llm-d#595) Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com> * deps(actions): bump crate-ci/typos from 1.42.2 to 1.43.0 (llm-d#596) Bumps [crate-ci/typos](https://github.com/crate-ci/typos) from 1.42.2 to 1.43.0. - [Release notes](https://github.com/crate-ci/typos/releases) - [Changelog](https://github.com/crate-ci/typos/blob/master/CHANGELOG.md) - [Commits](crate-ci/typos@v1.42.2...v1.43.0) --- updated-dependencies: - dependency-name: crate-ci/typos dependency-version: 1.43.0 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * address nil,nil return linter error in test mock (llm-d#598) Signed-off-by: Etai Lev Ran <elevran@gmail.com> * deps(go): bump the go-dependencies group with 2 updates (llm-d#597) Bumps the go-dependencies group with 2 updates: [github.com/onsi/ginkgo/v2](https://github.com/onsi/ginkgo) and [github.com/onsi/gomega](https://github.com/onsi/gomega). Updates `github.com/onsi/ginkgo/v2` from 2.27.5 to 2.28.1 - [Release notes](https://github.com/onsi/ginkgo/releases) - [Changelog](https://github.com/onsi/ginkgo/blob/master/CHANGELOG.md) - [Commits](onsi/ginkgo@v2.27.5...v2.28.1) Updates `github.com/onsi/gomega` from 1.39.0 to 1.39.1 - [Release notes](https://github.com/onsi/gomega/releases) - [Changelog](https://github.com/onsi/gomega/blob/master/CHANGELOG.md) - [Commits](onsi/gomega@v1.39.0...v1.39.1) --- updated-dependencies: - dependency-name: github.com/onsi/ginkgo/v2 dependency-version: 2.28.1 dependency-type: direct:production update-type: version-update:semver-minor dependency-group: go-dependencies - dependency-name: github.com/onsi/gomega dependency-version: 1.39.1 dependency-type: direct:production update-type: version-update:semver-patch dependency-group: go-dependencies ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Models extractor (llm-d#553) * Models extractor Signed-off-by: irar2 <irar@il.ibm.com> * Update register.go Signed-off-by: Ira Rosen <irar@il.ibm.com> * Updated for the newer GIE Signed-off-by: irar2 <irar@il.ibm.com> * Review comments Signed-off-by: irar2 <irar@il.ibm.com> * Check the scheme Signed-off-by: irar2 <irar@il.ibm.com> --------- Signed-off-by: irar2 <irar@il.ibm.com> Signed-off-by: Ira Rosen <irar@il.ibm.com> * feat(lmcache): implement decode first flow on lmcache connector when cache_hit_threshold field is present (llm-d#509) * feat: implement decode first flow on lmcache connector - if cache_hit_threshold field is present in completion request, then we perform a decode first flow Signed-off-by: kyano <kyanokashi2@gmail.com> * fix: error handling Signed-off-by: kyano <kyanokashi2@gmail.com> * chore: add back todo comment Signed-off-by: kyano <kyanokashi2@gmail.com> * refactor: reduce code complexity and duplication Signed-off-by: kyano <kyanokashi2@gmail.com> * refactor: improve header copying Signed-off-by: kyano <kyanokashi2@gmail.com> * chore: add comment explaning the cache_hit_threshold field and the new decode first flow Signed-off-by: kyano <kyanokashi2@gmail.com> * refactor: enhance logging for cache hit threshold in decode flow - decrease verbosity for common log - add cache_hit_threshold attribute Signed-off-by: kyano <kyanokashi2@gmail.com> * refactor: improve error handling and observability when failing to unmarshal decode response Signed-off-by: kyano <kyanokashi2@gmail.com> * chore: add deleted informational comments Signed-off-by: kyano <kyanokashi2@gmail.com> * typo Signed-off-by: kyano <kyanokashi2@gmail.com> * refactor: make error logs more descriptive of the failure reason Signed-off-by: kyano <kyanokashi2@gmail.com> * feat: add cache hit threshold to prefill request so prefill executes regardless of cache condition Signed-off-by: kyano <kyanokashi2@gmail.com> * fix: typo Signed-off-by: kyano <kyanokashi2@gmail.com> * refactor: assign 0 cache_hit_threshold before final decode attempt Signed-off-by: kyano <kyanokashi2@gmail.com> * chore: update comment according to feedback Signed-off-by: kyano <kyanokashi2@gmail.com> * chore: remove istio workaround Signed-off-by: kyano <kyanokashi2@gmail.com> * fix: set cache hit threshold to 0 in prefill request for consistent execution Signed-off-by: kyano <kyanokashi2@gmail.com> * refactor: update the log Signed-off-by: kyano <kyanokashi2@gmail.com> * feat: support online decoding Signed-off-by: kyano <kyanokashi2@gmail.com> * fix: preserve request body in lmcache connector Signed-off-by: kyano <kyanokashi2@gmail.com> * fix: support sse format for streamed decode Signed-off-by: kyano <kyanokashi2@gmail.com> * chore: add and improve log descriptions Signed-off-by: kyano <kyanokashi2@gmail.com> * fix: typo Signed-off-by: kyano <kyanokashi2@gmail.com> * nit: undo capitalization Signed-off-by: kyano <kyanokashi2@gmail.com> * fix: typos Signed-off-by: kyano <kyanokashi2@gmail.com> * chore: improve error log observability Signed-off-by: kyano <kyanokashi2@gmail.com> * refactor: encapsulate http error checking in function and reuse Signed-off-by: kyano <kyanokashi2@gmail.com> * refactor: encapsulate and reuse code better Signed-off-by: kyano <kyanokashi2@gmail.com> * fix: lint error Signed-off-by: kyano <kyanokashi2@gmail.com> * refactor: improve code encapsulation and reduce duplication Signed-off-by: kyano <kyanokashi2@gmail.com> * refactor: rename and simplify SSE event signaling logic Signed-off-by: kyano <kyanokashi2@gmail.com> * refactor: rename lmcache to shared storage protocol Signed-off-by: kyano <kyanokashi2@gmail.com> * fix: remove unused function Signed-off-by: kyano <kyanokashi2@gmail.com> * test: e2e tests Signed-off-by: kyanokashi <kyanokashi2@gmail.com> * chore: claude gitignore Signed-off-by: kyanokashi <kyanokashi2@gmail.com> * fix: sim deployment Signed-off-by: kyanokashi <kyanokashi2@gmail.com> * feat: make linter running on new code configurable Signed-off-by: kyanokashi <kyanokashi2@gmail.com> * fix: lint errors Signed-off-by: kyanokashi <kyanokashi2@gmail.com> --------- Signed-off-by: kyano <kyanokashi2@gmail.com> Signed-off-by: kyanokashi <71283892+kyanokashi@users.noreply.github.com> Signed-off-by: kyanokashi <kyanokashi2@gmail.com> * Extend support for different ways to decide if disaggregated PD is required (llm-d#531) * Initial step of a configurable pd decider which is responsible for decision whether disaggregation is required, use data added in prefix scorer plugin in PrepareRequestData Signed-off-by: Maya Barnea <mayab@il.ibm.com> * update version of GIE + fix lint Signed-off-by: Maya Barnea <mayab@il.ibm.com> * update yaml and the test according prefix plugin configuration change (blockSize replaced by blockSizeTokens) Signed-off-by: Maya Barnea <mayab@il.ibm.com> * Update docs/architecture.md Co-authored-by: Shmuel Kallner <kallner@il.ibm.com> Signed-off-by: Maya Barnea <mayab@il.ibm.com> * code review Signed-off-by: Maya Barnea <mayab@il.ibm.com> * code review Signed-off-by: Maya Barnea <mayab@il.ibm.com> * update version of GIE, update prefix_disagr_decider accordingly Signed-off-by: Maya Barnea <mayab@il.ibm.com> * fix typo Signed-off-by: Maya Barnea <mayab@il.ibm.com> * fix PD for short inputs Signed-off-by: Maya Barnea <mayab@il.ibm.com> * Update docs/architecture.md Co-authored-by: Etai Lev Ran <elevran@gmail.com> Signed-off-by: Maya Barnea <mayab@il.ibm.com> * Update pkg/plugins/profile/always_disaggr_decider.go Co-authored-by: Etai Lev Ran <elevran@gmail.com> Signed-off-by: Maya Barnea <mayab@il.ibm.com> * Update pkg/plugins/profile/always_disaggr_decider.go Co-authored-by: Etai Lev Ran <elevran@gmail.com> Signed-off-by: Maya Barnea <mayab@il.ibm.com> * Update pkg/plugins/profile/prefix_disagg_decider.go Co-authored-by: Etai Lev Ran <elevran@gmail.com> Signed-off-by: Maya Barnea <mayab@il.ibm.com> * updates according the PR comments Signed-off-by: Maya Barnea <mayab@il.ibm.com> * fix test Signed-off-by: Maya Barnea <mayab@il.ibm.com> * create pd decider plugin type with 2 implementations (for prefix based and test always), update deploy configuration according the new structure Signed-off-by: Maya Barnea <mayab@il.ibm.com> * fix e2e tests Signed-off-by: Maya Barnea <mayab@il.ibm.com> * changes according the pr comments Signed-off-by: Maya Barnea <mayab@il.ibm.com> * fix e2e test Signed-off-by: Maya Barnea <mayab@il.ibm.com> * add explanation about pd deciders to disagg_pd doc Signed-off-by: Maya Barnea <mayab@il.ibm.com> * rename always_disaggr_decider to always_disagg_decider Signed-off-by: Maya Barnea <mayab@il.ibm.com> --------- Signed-off-by: Maya Barnea <mayab@il.ibm.com> Co-authored-by: Shmuel Kallner <kallner@il.ibm.com> Co-authored-by: Etai Lev Ran <elevran@gmail.com> * chore: fix wrong port for NIXL (llm-d#593) - start with vLLM 0.11.1, default port for NIXL has been updated to 5600 - leave ZMQ to use 5557 Signed-off-by: Wen Zhou <wenzhou@redhat.com> * fix: resolve JSON serialization error in active-request-scorer debug logs (llm-d#602) * fix: resolve JSON serialization error in active-request-scorer debug logs Signed-off-by: Alberto Perdomo <aperdomo@redhat.com> * feat: Add raw scores to debug Signed-off-by: Alberto Perdomo <aperdomo@redhat.com> --------- Signed-off-by: Alberto Perdomo <aperdomo@redhat.com> * Implement "LGTM" ChatOps Workflow. Signed-off-by: Revital Sur <eres@il.ibm.com> * test Signed-off-by: Revital Sur <eres@il.ibm.com> * Lgtm2 (#17) * Implement "LGTM" ChatOps Workflow. Signed-off-by: Revital Sur <eres@il.ibm.com> * test Signed-off-by: Revital Sur <eres@il.ibm.com> --------- Signed-off-by: Revital Sur <eres@il.ibm.com> * test * test: automated LGTM workflow test (#19) This PR tests the /lgtm command workflow automation. Test suite: all Signed-off-by: Revital Sur <eres@il.ibm.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com> * test: automated LGTM workflow test (#20) This PR tests the /lgtm command workflow automation. Test suite: all Signed-off-by: Revital Sur <eres@il.ibm.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com> * test: automated LGTM workflow test (#21) This PR tests the /lgtm command workflow automation. Test suite: all Signed-off-by: Revital Sur <eres@il.ibm.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com> * test: automated LGTM workflow test (#22) This PR tests the /lgtm command workflow automation. Test suite: reset Signed-off-by: Revital Sur <eres@il.ibm.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com> * test Signed-off-by: Revital Sur <eres@il.ibm.com> * test: automated LGTM workflow test (#24) This PR tests the /lgtm command workflow automation. Test suite: reset Signed-off-by: Revital Sur <eres@il.ibm.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com> * test Signed-off-by: Revital Sur <eres@il.ibm.com> * test: automated LGTM workflow test (#26) This PR tests the /lgtm command workflow automation. Test suite: reset Signed-off-by: Revital Sur <eres@il.ibm.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com> * test Signed-off-by: Revital Sur <eres@il.ibm.com> * Address review comments. Signed-off-by: Revital Sur <eres@il.ibm.com> * test: automated LGTM workflow test This PR tests the /lgtm command workflow automation. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> Signed-off-by: Revital Sur <eres@il.ibm.com> --------- Signed-off-by: Wen Zhou <wenzhou@redhat.com> Signed-off-by: Etai Lev Ran <elevran@gmail.com> Signed-off-by: dependabot[bot] <support@github.com> Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com> Signed-off-by: irar2 <irar@il.ibm.com> Signed-off-by: Ira Rosen <irar@il.ibm.com> Signed-off-by: kyano <kyanokashi2@gmail.com> Signed-off-by: kyanokashi <71283892+kyanokashi@users.noreply.github.com> Signed-off-by: kyanokashi <kyanokashi2@gmail.com> Signed-off-by: Maya Barnea <mayab@il.ibm.com> Signed-off-by: Alberto Perdomo <aperdomo@redhat.com> Signed-off-by: Revital Sur <eres@il.ibm.com> Co-authored-by: Wen Zhou <wenzhou@redhat.com> Co-authored-by: Etai Lev Ran <elevran@gmail.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Shmuel Kallner <kallner@il.ibm.com> Co-authored-by: Maroon Ayoub <maroon.ayoub@ibm.com> Co-authored-by: Ira Rosen <irar@il.ibm.com> Co-authored-by: kyanokashi <71283892+kyanokashi@users.noreply.github.com> Co-authored-by: Maya Barnea <mayab@il.ibm.com> Co-authored-by: alberto <aperdomo@redhat.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com> * test Signed-off-by: Revital Sur <eres@il.ibm.com> * test: open-pr Tests that opening a PR triggers gatekeeper which blocks without lgtm label. Test timestamp: 1771188042 Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> --------- Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com> Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> Signed-off-by: Wen Zhou <wenzhou@redhat.com> Signed-off-by: Etai Lev Ran <elevran@gmail.com> Signed-off-by: dependabot[bot] <support@github.com> Signed-off-by: irar2 <irar@il.ibm.com> Signed-off-by: Ira Rosen <irar@il.ibm.com> Signed-off-by: kyano <kyanokashi2@gmail.com> Signed-off-by: kyanokashi <71283892+kyanokashi@users.noreply.github.com> Signed-off-by: kyanokashi <kyanokashi2@gmail.com> Signed-off-by: Maya Barnea <mayab@il.ibm.com> Signed-off-by: Alberto Perdomo <aperdomo@redhat.com> Signed-off-by: Revital Sur <eres@il.ibm.com> Co-authored-by: Maroon Ayoub <maroon.ayoub@ibm.com> Co-authored-by: Shmuel Kallner <kallner@il.ibm.com> Co-authored-by: Wen Zhou <wenzhou@redhat.com> Co-authored-by: Etai Lev Ran <elevran@gmail.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Ira Rosen <irar@il.ibm.com> Co-authored-by: kyanokashi <71283892+kyanokashi@users.noreply.github.com> Co-authored-by: Maya Barnea <mayab@il.ibm.com> Co-authored-by: alberto <aperdomo@redhat.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>

Match documentation with default model in scripts

2ed9106

Signed-off-by: Shmuel Kallner <kallner@il.ibm.com>

github-project-automation bot added this to llm-d-inference-scheduler Feb 12, 2026

github-actions bot requested review from kfswain and nirrozenbaum February 12, 2026 16:58

github-actions bot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 12, 2026

github-actions bot approved these changes Feb 12, 2026

View reviewed changes

github-project-automation bot moved this to In review in llm-d-inference-scheduler Feb 12, 2026

github-actions bot merged commit cdd8760 into llm-d:main Feb 12, 2026
10 checks passed

github-project-automation bot moved this from In review to Done in llm-d-inference-scheduler Feb 12, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Match documentation with default model in scripts#615

Match documentation with default model in scripts#615
github-actions[bot] merged 1 commit intollm-d:mainfrom
shmuelk:dev-fix

shmuelk commented Feb 12, 2026

Uh oh!

nirrozenbaum commented Feb 12, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

shmuelk commented Feb 12, 2026

Uh oh!

nirrozenbaum commented Feb 12, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants